This script analyzes Origin-Destination (OD) data gathered from Twitter. The origin locations are collected within a given radius of a chosen location, and the destination locations are collected from user timelines after an origin tweet has been observed. The OD data is cleaned based on several criteria so that the final OD matrix consists of matched origin-destination tweet pairs, in which the destination tweet follows the origin tweet and each pair represents a trip taken by the user.
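The pairing step can be sketched as follows. This is a minimal Python illustration, not the model's R code: the `Tweet` and `ODPair` records, the `is_origin` flag (marking tweets that fall inside the origin radius), and the rule that the destination is the first later tweet outside that radius are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Tweet:
    user_id: str
    time: datetime
    lat: float
    lon: float
    is_origin: bool  # hypothetical flag: True if the tweet lies inside the origin radius


@dataclass
class ODPair:
    user_id: str
    origin: Tweet
    destination: Tweet


def build_od_pairs(tweets: List[Tweet]) -> List[ODPair]:
    """Pair each user's first origin tweet with the first later tweet outside the origin area."""
    by_user: Dict[str, List[Tweet]] = {}
    for t in tweets:
        by_user.setdefault(t.user_id, []).append(t)

    pairs: List[ODPair] = []
    for user_id, user_tweets in by_user.items():
        user_tweets.sort(key=lambda t: t.time)  # chronological order per user
        origin: Optional[Tweet] = None
        for t in user_tweets:
            if origin is None:
                # Tweets before the first origin tweet cannot be destinations.
                if t.is_origin:
                    origin = t
            elif t.time > origin.time and not t.is_origin:
                # Later tweets still inside the origin radius are skipped.
                pairs.append(ODPair(user_id, origin, t))
                break  # one OD pair per user in this sketch
    return pairs
```

Users with an origin tweet but no qualifying later tweet simply produce no pair, which mirrors how the number of OD pairs can be much smaller than the number of origin tweets observed.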
The model visualizes the OD data and calculates travel summary statistics, including Euclidean distance, route distance, time difference between tweets, and estimated route time, among others. It then aggregates the destination data into Public Use Microdata Areas (PUMAs) within Los Angeles County to determine the density of destinations throughout the region. Once the PUMAs have been visualized, the model imports and analyzes demographic data from the 2017 American Community Survey (ACS) estimates to visualize the characteristics of each PUMA. Finally, the model exports the PUMA shapefile, including its attributes, for regression analysis in ArcGIS Pro.
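Aggregating destinations into PUMAs is a point-in-polygon problem. The minimal sketch below assumes each PUMA is supplied as a simple polygon ring in projected (x, y) coordinates and that PUMAs do not overlap; it counts destinations per polygon with the even-odd ray-casting test and divides by the shoelace-formula area to get a density. The function names are hypothetical, and a real workflow would normally use a GIS library's spatial join instead.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in a projected coordinate system (assumption)
Polygon = List[Point]        # vertices in order; the ring is not explicitly closed


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Even-odd rule: cast a ray toward +x and toggle on every edge it crosses."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_area(poly: Polygon) -> float:
    """Shoelace formula; units are the square of the coordinate units."""
    s = 0.0
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def count_points_per_polygon(points: Sequence[Point], polygons: Dict[str, Polygon]) -> Dict[str, int]:
    counts = {name: 0 for name in polygons}
    for pt in points:
        for name, poly in polygons.items():
            if point_in_polygon(pt, poly):
                counts[name] += 1
                break  # polygons are assumed not to overlap
    return counts


def densities(points: Sequence[Point], polygons: Dict[str, Polygon]) -> Dict[str, float]:
    """Destinations per unit area for each polygon."""
    counts = count_points_per_polygon(points, polygons)
    return {name: counts[name] / polygon_area(poly) for name, poly in polygons.items()}
```

Dividing by area matters because PUMAs are drawn to hold similar populations, so their land areas differ widely and raw counts alone would favor the largest polygons.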
The final results show where destination locations are concentrated and how demographic characteristics vary throughout the region. These results offer a better understanding of where people travel and why, help characterize the travel behavior of individuals, and can cultivate interest in and discussion of efficient mobility solutions.
The following sections present the results of a model demonstration, the complete R workflow for the model, and a conclusion with next steps.
This model was initially built using tweets from users in Los Angeles, California. Origin locations were defined as any tweet posted within two miles of Los Angeles International Airport (LAX), and destination locations as any subsequent tweet posted within Los Angeles County. The Twitter data extraction script was run in Python, and its output was exported as a .csv file for import into this model. Twitter data was extracted for just under four days between April 18 and April 22, and a total of 1,439 origin tweets were observed. After data extraction, cleaning, and processing, 136 OD pairs were created. These OD pairs represent users who tweeted once within two miles of LAX and again anywhere within Los Angeles County.
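Applying the two-mile origin rule to geotagged tweets amounts to a distance check against the airport. This sketch uses the haversine great-circle distance; the LAX coordinates are approximate, and they, the mean Earth radius constant, and the helper names are assumptions for illustration rather than values taken from the extraction script.

```python
import math

EARTH_RADIUS_MI = 3958.8      # mean Earth radius in miles
LAX = (33.9416, -118.4085)    # approximate airport coordinates (assumption)


def haversine_mi(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def is_origin_tweet(lat: float, lon: float, center=LAX, radius_mi: float = 2.0) -> bool:
    """True if the tweet lies within radius_mi of the center point."""
    return haversine_mi(center[0], center[1], lat, lon) <= radius_mi
```

For example, a tweet geotagged at the airport passes the check, while one from downtown Los Angeles (roughly 12 miles away) does not.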
The results below show travel summary statistics for the 136 OD pairs, as well as a map of destination density within Los Angeles. A second map shows demographic characteristics by PUMA for Los Angeles County. Given the short extraction period and the resulting small number of OD pairs, the initial dataset was not sufficient for regression analysis. However, the model still includes the export workflow that gathers the information needed to run a regression. The Python script used to gather the demonstration data is currently running to obtain a larger dataset, which can be used to build a more realistic model and gain greater insight through regression and other analytical methods.
This table shows travel summary statistics for the demonstration dataset. The Google Maps API was used to determine route distance and route time. The pie chart groups the OD pairs into five categories to show the distribution of destination distances.
| Statistic | Value |
|---|---|
| Average Euclidean Distance (mi) | 10.59 |
| Average time difference between tweets (min) | 389.91 |
| Average Route Distance (mi) | 14.35 |
| Average Route Time (min) | 25.64 |
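
The averages in the table and the pie-chart categories can be reproduced in a few lines once each OD pair has a distance and a time difference. The bin edges below are hypothetical, since the five category boundaries are not stated, and `minutes_between`, `distance_category`, and `summarize` are illustrative names rather than part of the model.

```python
from bisect import bisect_right
from datetime import datetime
from statistics import mean
from typing import Dict, Sequence

# Hypothetical category edges in miles; the real pie-chart boundaries are not given.
BIN_EDGES = [5.0, 10.0, 15.0, 20.0]
BIN_LABELS = ["0-5 mi", "5-10 mi", "10-15 mi", "15-20 mi", "20+ mi"]


def minutes_between(t_origin: datetime, t_dest: datetime) -> float:
    """Time difference between the origin and destination tweets, in minutes."""
    return (t_dest - t_origin).total_seconds() / 60.0


def distance_category(d_mi: float) -> str:
    # bisect_right counts the edges <= d_mi, so a value equal to an edge
    # falls into the higher category.
    return BIN_LABELS[bisect_right(BIN_EDGES, d_mi)]


def summarize(distances_mi: Sequence[float], minutes: Sequence[float]) -> Dict[str, object]:
    counts = {label: 0 for label in BIN_LABELS}
    for d in distances_mi:
        counts[distance_category(d)] += 1
    return {
        "avg_distance_mi": mean(distances_mi),
        "avg_time_diff_min": mean(minutes),
        "category_counts": counts,
    }
```

The same `summarize` call works for Euclidean and route distances alike; only the input list changes.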
The map below shows OD locations overlaid on a choropleth of destination density per PUMA, classified using the Jenks natural breaks method. Hovering over a polygon shows the summary statistics for that area.
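Jenks natural breaks chooses class boundaries that minimize the total squared deviation of values from their class means. The sketch below is the exact dynamic-programming formulation (often attributed to Fisher), which runs in O(k·n²) time for n values and k classes. It is a standalone illustration; the R workflow may rely on a library implementation whose boundary convention differs slightly. The hypothetical `jenks_breaks` returns the minimum value followed by each class's upper bound.

```python
from typing import List, Sequence


def jenks_breaks(values: Sequence[float], n_classes: int) -> List[float]:
    """Return [min, upper_1, ..., upper_k] minimizing within-class squared deviation."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and the number of values")

    # Prefix sums give the squared deviation of any slice in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(data):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i: int, j: int) -> float:
        """Sum of squared deviations of data[i:j] about its mean (j exclusive)."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best total cost of splitting data[:j] into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # last class is data[i:j]
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    split[k][j] = i

    # Walk back through the split table to recover each class's upper bound.
    bounds: List[float] = []
    j = n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = split[k][j]
    return data[:1] + bounds[::-1]
```

For well-separated groups such as `[1, 2, 3, 10, 11, 12, 20, 21, 22]` with three classes, the breaks fall between the groups, giving `[1, 3, 12, 22]`.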
The map below shows OD locations overlaid on a choropleth of population density per PUMA (the default layer), also classified using the Jenks method. Hovering over a polygon shows its destination count and the value of every other demographic characteristic included in this model. The layers tab in the top corner of the map lets you display the choropleth for each demographic characteristic.